Prometheus Recording Rules Optimization in Practice
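Over time, the number of metrics stored in Prometheus kept growing, and so did the query rate. As more and more dashboards were added to Grafana, I started to see Grafana failing to render graphs in time and Prometheus queries timing out, especially when aggregating large numbers of metrics over long time ranges.
This article uses Prometheus recording rules to speed up PromQL queries over high-cardinality metrics.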
Checking rule syntax
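To quickly check whether a rule file is syntactically correct without starting a Prometheus server, use promtool. It ships in the official Prometheus release archives alongside the prometheus binary. With older Go toolchains (before Go 1.18, when go get still built and installed binaries) it could also be installed from source with the first command below.

```shell
go get github.com/prometheus/prometheus/cmd/promtool
promtool check rules /path/to/example.rules.yml
# If the file is syntactically correct, the checker prints a textual representation of the parsed rules to stdout and exits with return code 0.
# If there are any syntax errors or invalid input arguments, it prints an error message to stderr and exits with return code 1.
```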
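Because the exit code is the contract, the check is easy to wire into CI or a pre-commit hook. The sketch below is a hypothetical Python wrapper (check_rules is an illustrative name, not part of promtool) that runs promtool check rules and returns whether it passed together with its output. It assumes promtool is on PATH unless another executable path is passed in.

```python
import subprocess
from typing import Tuple


def check_rules(path: str, promtool: str = "promtool") -> Tuple[bool, str]:
    """Run `promtool check rules <path>` and return (passed, output).

    Hypothetical helper: it relies only on promtool's exit code,
    0 for a valid rule file and non-zero for errors.
    """
    try:
        result = subprocess.run(
            [promtool, "check", "rules", path],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # The promtool executable itself could not be found.
        return False, f"{promtool} not found"
    passed = result.returncode == 0
    # Return stdout on success, stderr (where errors are written) on failure.
    return passed, result.stdout if passed else result.stderr
```

Usage: passed, output = check_rules("/path/to/example.rules.yml"), then fail the CI job when passed is False.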
Recording rules
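Recording rules let you precompute frequently needed or computationally expensive expressions and save the results as a new set of time series. Querying the precomputed result is usually much faster than evaluating the original expression every time it is needed. This is especially useful for dashboards, which query the same expressions again on every refresh.
A simple example rules file might look like this: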
example.rules.yml

```yaml
groups:
  - name: example
    rules:
      - record: job:http_inprogress_requests:sum
        expr: sum(http_inprogress_requests) by (job)
```
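Why this is faster: the rule stores sum(http_inprogress_requests) by (job), so however many series the raw metric has (for instance one per instance), the recorded series job:http_inprogress_requests:sum has only one per job, and a dashboard reads those few series instead of re-aggregating the raw ones. The name follows Prometheus's level:metric:operations convention: job is the aggregation level, http_inprogress_requests the metric, sum the operation. The Python below is only a toy model of that aggregation on a single instant vector; the label sets and values are made up, and in reality Prometheus performs the evaluation itself on every rule evaluation interval.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One sample of an instant vector: (label set, value). Toy data only.
Sample = Tuple[Dict[str, str], float]


def sum_by(samples: List[Sample], label: str) -> Dict[str, float]:
    """Mimic PromQL `sum(...) by (label)`: one output value per label value."""
    out: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        # Series without the label are grouped together under "".
        out[labels.get(label, "")] += value
    return dict(out)


raw = [
    ({"job": "api", "instance": "a"}, 3.0),
    ({"job": "api", "instance": "b"}, 2.0),
    ({"job": "db", "instance": "c"}, 1.0),
]
# Three raw series collapse into two recorded series, one per job.
recorded = sum_by(raw, "job")
```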
For example, let's convert the following Grafana panel queries into recording rules:
```promql
sum(rate(container_network_receive_bytes_total[5m])) by (node)
sum(rate(container_network_transmit_bytes_total[5m])) by (node)
sum(rate(wmi_container_network_receive_bytes_total[5m])) by (node)
sum(rate(wmi_container_network_transmit_bytes_total[5m])) by (node)
```
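The four queries differ only in the metric name, so the rule entries could also be generated instead of written by hand. The sketch below is hypothetical tooling, not part of the original setup: render_group is an illustrative helper that emits a plain Prometheus rules file for sum(rate(metric[5m])) by (node). It names each recorded series by the level:metric:operations convention from the Prometheus documentation, dropping the _total suffix because the output of rate() is a per-second rate rather than a counter. The configuration that follows keeps the author's own custom_ names; either scheme works as long as the Grafana panels query the same names.

```python
from typing import List


def render_group(name: str, metrics: List[str], window: str = "5m", by: str = "node") -> str:
    """Render a rules group recording sum(rate(<metric>[window])) by (<by>).

    Record names follow level:metric:operations, e.g.
    node:container_network_receive_bytes:rate5m.
    """
    lines = ["groups:", f"  - name: {name}", "    rules:"]
    for metric in metrics:
        # rate() turns a counter into a rate, so strip the counter suffix.
        base = metric[: -len("_total")] if metric.endswith("_total") else metric
        record = f"{by}:{base}:rate{window}"
        expr = f"sum(rate({metric}[{window}])) by ({by})"
        lines.append(f"      - record: {record}")
        lines.append(f"        expr: {expr}")
    return "\n".join(lines) + "\n"


rules_yaml = render_group("network_IO", [
    "container_network_receive_bytes_total",
    "container_network_transmit_bytes_total",
    "wmi_container_network_receive_bytes_total",
    "wmi_container_network_transmit_bytes_total",
])
```

The output is a standalone rules file; to use it in a PrometheusRule, place the group under spec: and adjust the indentation.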
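Write the corresponding rule configuration. In this cluster Prometheus is managed by the Prometheus Operator deployed by Rancher cluster monitoring, so the rules go into a PrometheusRule resource rather than a plain rules file. The Operator only loads PrometheusRule objects whose labels match the ruleSelector of the Prometheus resource, which is presumably why the labels below mirror those of Rancher's own rules: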
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  generation: 1
  labels:
    app: exporter-kubernetes
    chart: exporter-kubernetes-0.0.1
    heritage: Tiller
    io.cattle.field/appId: cluster-monitoring
    release: cluster-monitoring
    source: rancher-monitoring
  name: custom
  namespace: cattle-prometheus
spec:
  groups:
    - name: network_IO
      rules:
        - record: custom_container_network_receive_bytes_total
          expr: sum(rate(container_network_receive_bytes_total[5m])) by (node)
        - record: custom_container_network_transmit_bytes_total
          expr: sum(rate(container_network_transmit_bytes_total[5m])) by (node)
        - record: custom_wmi_container_network_receive_bytes_total
          expr: sum(rate(wmi_container_network_receive_bytes_total[5m])) by (node)
        - record: custom_wmi_container_network_transmit_bytes_total
          expr: sum(rate(wmi_container_network_transmit_bytes_total[5m])) by (node)
```
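Once the rules are loaded, the Grafana panels should query the recorded series (for example custom_container_network_receive_bytes_total) instead of the original expressions. A recorded series only has data from the moment the rule starts being evaluated, so it is empty before that point. The sketch below checks a recorded series through the Prometheus HTTP API instant-query endpoint /api/v1/query; the base URL http://localhost:9090 and the helper names are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(payload: Dict[str, Any], label: str = "node") -> Dict[str, float]:
    """Map an instant-vector response to {label value: sample value}."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each result item looks like {"metric": {...}, "value": [timestamp, "value"]}.
    return {
        item["metric"].get(label, ""): float(item["value"][1])
        for item in payload["data"]["result"]
    }


def instant_query(base_url: str, expr: str, timeout: float = 10.0) -> Dict[str, float]:
    """Run an instant query; urlopen raises HTTPError on non-2xx responses."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

Usage: instant_query("http://localhost:9090", "custom_container_network_receive_bytes_total") returns the current receive rate per node; an empty dict means the rule has not produced data yet.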